library("knitr") # for knitting RMarkdown
library("kableExtra") # for making nice tables
library("janitor") # for cleaning column names
library("broom") # for tidying up linear models
library("patchwork") # for making figure panels
library("lme4") # for linear mixed effects models
library("modelr") # for bootstrapping
library("boot") # also for bootstrapping
library("tidyverse") # for wrangling, plotting, etc.
# include references for used packages
knitr::write_bib(.packages(), "packages.bib")
theme_set(
theme_classic() + #set the theme
theme(text = element_text(size = 20)) #set the default text size
)
# load sleepstudy data set
df.sleep = sleepstudy %>%
as_tibble() %>%
clean_names() %>%
mutate(subject = as.character(subject)) %>%
select(subject, days, reaction)
# add two fake participants (with missing data)
df.sleep = df.sleep %>%
bind_rows(
tibble(subject = "374",
days = 0:1,
reaction = c(286, 288)),
tibble(subject = "373",
days = 0,
reaction = 245)
)
Some code to draw a t-distribution:
tibble(x = c(-4, 4)) %>%
ggplot(data = .,
mapping = aes(x = x)) +
stat_function(fun = "dt",
args = list(df = 20),
size = 1,
geom = "area",
fill = "red",
# xlim = c(qt(0.95, df = 20), qt(0.999, df = 20))) +
# xlim = c(qt(0.001, df = 20), qt(0.05, df = 20))) +
xlim = c(qt(0.001, df = 20), qt(0.025, df = 20))) +
stat_function(fun = "dt",
args = list(df = 20),
size = 1,
geom = "area",
fill = "red",
xlim = c(qt(0.975, df = 20), qt(0.999, df = 20))) +
stat_function(fun = "dt",
args = list(df = 20),
size = 1) +
coord_cartesian(expand = F)
Some code to draw an F-distribution
tibble(x = c(0, 5)) %>%
ggplot(data = .,
mapping = aes(x = x)) +
stat_function(fun = "df",
args = list(df1 = 100, df2 = 10),
size = 1,
geom = "area",
fill = "red",
xlim = c(qf(0.95, df1 = 100, df2 = 10), qf(0.999, df1 = 100, df2 = 10))) +
stat_function(fun = "df",
args = list(df1 = 100, df2 = 10),
size = 1) +
coord_cartesian(expand = F)
What if we have groups of participants who differ from each other? Let’s generate data for which this is the case.
# make example reproducible
set.seed(1)
sample_size = 20
b0 = 1
b1 = 2
sd_residual = 0.5
sd_participant = 0.5
mean_group1 = 1
mean_group2 = 10
df.mixed = tibble(
condition = rep(0:1, each = sample_size),
participant = rep(1:sample_size, 2)) %>%
group_by(participant) %>%
mutate(group = sample(1:2, size = 1),
intercept = ifelse(group == 1,
rnorm(n(), mean = mean_group1, sd = sd_participant),
rnorm(n(), mean = mean_group2, sd = sd_participant))) %>%
group_by(condition) %>%
mutate(value = b0 + b1 * condition + intercept + rnorm(n(), sd = sd_residual)) %>%
ungroup %>%
mutate(condition = as.factor(condition),
participant = as.factor(participant))
Let’ first fit a model that ignores the fact that there are two different groups of participatns.
# fit model
fit.mixed = lmer(formula = value ~ 1 + condition + (1 | participant),
data = df.mixed)
fit.mixed %>% summary()
Linear mixed model fit by REML ['lmerMod']
Formula: value ~ 1 + condition + (1 | participant)
Data: df.mixed
REML criterion at convergence: 165.6
Scaled residuals:
Min 1Q Median 3Q Max
-1.6437 -0.4510 -0.0246 0.4987 1.5265
Random effects:
Groups Name Variance Std.Dev.
participant (Intercept) 21.5142 4.6383
Residual 0.3521 0.5934
Number of obs: 40, groups: participant, 20
Fixed effects:
Estimate Std. Error t value
(Intercept) 7.2229 1.0456 6.908
condition1 1.6652 0.1876 8.875
Correlation of Fixed Effects:
(Intr)
condition1 -0.090
Let’s look at the model’s predictions:
fit.mixed %>%
augment() %>%
clean_names() %>%
ggplot(data = .,
mapping = aes(x = condition,
y = value,
group = participant)) +
geom_point(alpha = 0.5) +
geom_line(alpha = 0.5) +
geom_point(aes(y = fitted),
color = "red") +
geom_line(aes(y = fitted),
color = "red")
And let’s simulate some data from the fitted model:
# simulated data
fit.mixed %>%
simulate() %>%
bind_cols(df.mixed) %>%
ggplot(data = .,
mapping = aes(x = condition,
y = sim_1,
group = participant)) +
geom_line(alpha = 0.5) +
geom_point(alpha = 0.5)
As we can see, the simulated data doesn’t look like the data that was used to fit the model.
Now, let’s fit a model that takes the differences between groups into account by adding a fixed effect for group.
# fit model
fit.grouped = lmer(formula = value ~ 1 + group + condition + (1 | participant),
data = df.mixed)
fit.grouped %>% summary()
Linear mixed model fit by REML ['lmerMod']
Formula: value ~ 1 + group + condition + (1 | participant)
Data: df.mixed
REML criterion at convergence: 83.7
Scaled residuals:
Min 1Q Median 3Q Max
-1.56168 -0.69876 0.05887 0.50419 2.30259
Random effects:
Groups Name Variance Std.Dev.
participant (Intercept) 0.1147 0.3387
Residual 0.3521 0.5934
Number of obs: 40, groups: participant, 20
Fixed effects:
Estimate Std. Error t value
(Intercept) -6.8299 0.4055 -16.842
group 9.0663 0.2424 37.409
condition1 1.6652 0.1876 8.875
Correlation of Fixed Effects:
(Intr) group
group -0.926
condition1 -0.231 0.000
Note how the variance of the random intercepts is much smaller now that we’ve taken the group structure in the data into account.
Let’s visualize the model’s predictions:
fit.grouped %>%
augment() %>%
clean_names() %>%
ggplot(data = .,
mapping = aes(x = condition,
y = value,
group = participant)) +
geom_point(alpha = 0.5) +
geom_line(alpha = 0.5) +
geom_point(aes(y = fitted),
color = "red") +
geom_line(aes(y = fitted),
color = "red")
And simulate some data from the model:
# simulated data
fit.grouped %>%
simulate() %>%
bind_cols(df.mixed) %>%
ggplot(data = .,
mapping = aes(x = condition,
y = sim_1,
group = participant)) +
geom_line(alpha = 0.5) +
geom_point(alpha = 0.5)
This time, the simulated data looks much more like the data that was used to fit the model. Yay!
The example above has shown that we can take overall differences between groups into account by adding a fixed effect. Can we also deal with heterogeneity in variance between groups? For example, what if the responses of one group exhibit much more variance than the responses of another group?
Let’s first generate some data with heterogeneous variance:
# make example reproducible
set.seed(1)
sample_size = 20
b0 = 1
b1 = 2
sd_residual = 0.5
mean_group1 = 1
sd_group1 = 1
mean_group2 = 30
sd_group2 = 10
df.variance = tibble(
condition = rep(0:1, each = sample_size),
participant = rep(1:sample_size, 2)) %>%
group_by(participant) %>%
mutate(group = sample(1:2, size = 1),
intercept = ifelse(group == 1,
rnorm(n(), mean = mean_group1, sd = sd_group1),
rnorm(n(), mean = mean_group2, sd = sd_group2))) %>%
group_by(condition) %>%
mutate(value = b0 + b1 * condition + intercept + rnorm(n(), sd = sd_residual)) %>%
ungroup %>%
mutate(condition = as.factor(condition),
participant = as.factor(participant))
Let’s fit the model:
# fit model
fit.variance = lmer(formula = value ~ 1 + group + condition + (1 | participant),
data = df.variance)
fit.variance %>% summary()
Linear mixed model fit by REML ['lmerMod']
Formula: value ~ 1 + group + condition + (1 | participant)
Data: df.variance
REML criterion at convergence: 250
Scaled residuals:
Min 1Q Median 3Q Max
-2.70344 -0.21278 0.07355 0.43873 1.39493
Random effects:
Groups Name Variance Std.Dev.
participant (Intercept) 17.60 4.196
Residual 26.72 5.169
Number of obs: 40, groups: participant, 20
Fixed effects:
Estimate Std. Error t value
(Intercept) -26.5805 4.1525 -6.401
group 29.6200 2.5010 11.843
condition1 0.1853 1.6346 0.113
Correlation of Fixed Effects:
(Intr) group
group -0.934
condition1 -0.197 0.000
Look at the data and model predictions:
fit.variance %>%
augment() %>%
clean_names() %>%
ggplot(data = .,
mapping = aes(x = condition,
y = value,
group = participant)) +
geom_point(alpha = 0.5) +
geom_line(alpha = 0.5) +
geom_point(aes(y = fitted),
color = "red") +
geom_line(aes(y = fitted),
color = "red")
And the simulated data:
# simulated data
fit.mixed %>%
simulate() %>%
bind_cols(df.mixed) %>%
ggplot(data = .,
mapping = aes(x = condition,
y = sim_1,
group = participant)) +
geom_line(alpha = 0.5) +
geom_point(alpha = 0.5)
The lmer() fails here. It uses one normal distribution to model the variance between participants. It cannot account for the fact that the answers of one groups of participants vary more than the answers from another groups of participants. Again, the simulated data doesn’t look the original data, even though we did take the grouping into account.
Let’s illustrate the concept of pooling and shrinkage via the sleep data set that comes with the lmer package. We’ve already loaded the data set into our environment as df.sleep.
Let’s start by visualizing the data
# visualize the data
ggplot(data = df.sleep,
mapping = aes(x = days, y = reaction)) +
geom_point() +
facet_wrap(~subject, ncol = 5) +
labs(x = "Days of sleep deprivation",
y = "Average reaction time (ms)") +
scale_x_continuous(breaks = 0:4 * 2) +
theme(strip.text = element_text(size = 12),
axis.text.y = element_text(size = 12))
The plot shows the effect of the number of days of sleep deprivation on the average reaction time (presumably in an experiment). Note that for participant 373 and 374 we only have one and two data points respectively.
Let’s first fit a model the simply combines all the data points. This model ignores the dependence structure in the data (i.e. the fact that we have repeated observations from the same participants).
fit.complete = lm(formula = reaction ~ days,
data = df.sleep)
fit.params = tidy(fit.complete)
fit.complete %>%
summary()
Call:
lm(formula = reaction ~ days, data = df.sleep)
Residuals:
Min 1Q Median 3Q Max
-110.646 -27.951 1.829 26.388 139.875
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 252.321 6.406 39.389 < 2e-16 ***
days 10.328 1.210 8.537 5.48e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 47.43 on 181 degrees of freedom
Multiple R-squared: 0.2871, Adjusted R-squared: 0.2831
F-statistic: 72.88 on 1 and 181 DF, p-value: 5.484e-15
And let’s visualize the predictions of this model.
# visualization (aggregate)
ggplot(data = df.sleep,
mapping = aes(x = days, y = reaction)) +
geom_abline(intercept = fit.params$estimate[1],
slope = fit.params$estimate[2],
color = "blue") +
geom_point() +
labs(x = "Days of sleep deprivation",
y = "Average reaction time (ms)") +
scale_x_continuous(breaks = 0:4 * 2) +
theme(strip.text = element_text(size = 12),
axis.text.y = element_text(size = 12))
And here is what the model’s predictions look like separated by participant.
# visualization (separate participants)
ggplot(data = df.sleep,
mapping = aes(x = days, y = reaction)) +
geom_abline(intercept = fit.params$estimate[1],
slope = fit.params$estimate[2],
color = "blue") +
geom_point() +
facet_wrap(~subject, ncol = 5) +
labs(x = "Days of sleep deprivation",
y = "Average reaction time (ms)") +
scale_x_continuous(breaks = 0:4 * 2) +
theme(strip.text = element_text(size = 12),
axis.text.y = element_text(size = 12))
The model predicts the same relationship between sleep deprivation and reaction time for each participant (not surprising since we didn’t even tell the model that this data is based on different participants).
We could also fit separate regressions for each participant. Let’s do that.
# fit regressions and extract parameter estimates
df.no_pooling = df.sleep %>%
group_by(subject) %>%
nest(days, reaction) %>%
mutate(fit = map(data, ~ lm(reaction ~ days, data = .)),
params = map(fit, tidy)) %>%
unnest(params) %>%
select(subject, term, estimate) %>%
complete(subject, term, fill = list(estimate = 0)) %>%
spread(term, estimate) %>%
clean_names()
And let’s visualize what the predictions of these separate regressions would look like:
ggplot(data = df.sleep,
mapping = aes(x = days,
y = reaction)) +
geom_abline(data = df.no_pooling %>%
filter(subject != 373),
aes(intercept = intercept,
slope = days),
color = "blue") +
geom_point() +
facet_wrap(~subject, ncol = 5) +
labs(x = "Days of sleep deprivation",
y = "Average reaction time (ms)") +
scale_x_continuous(breaks = 0:4 * 2) +
theme(strip.text = element_text(size = 12),
axis.text.y = element_text(size = 12))
When we fit separate regression, no information is shared between participants.
By usign linear mixed effects models, we are partially pooling information. That is, the estimates for one participant are influenced by the rest of the participants.
We’ll fit a number of mixed effects models that differ in their random effects structure.
This model allows for random differences in the intercepts and slopes between subjects (and also models the correlation between intercepts and slopes).
Let’s fit the model
fit.random_intercept_slope = lmer(formula = reaction ~ 1 + days + (1 + days | subject),
data = df.sleep)
and take a look at the model’s predictions:
fit.random_intercept_slope %>%
augment() %>%
clean_names() %>%
ggplot(data = .,
mapping = aes(x = days,
y = reaction)) +
geom_line(aes(y = fitted),
color = "blue") +
geom_point() +
facet_wrap(~subject, ncol = 5) +
labs(x = "Days of sleep deprivation",
y = "Average reaction time (ms)") +
scale_x_continuous(breaks = 0:4 * 2) +
theme(strip.text = element_text(size = 12),
axis.text.y = element_text(size = 12))
geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?
As we can see, the lines for each participant are different. We’ve allowed for the intercept as well as the relationship between sleep deprivation and reaction time to be different between participants.
Let’s fit a model that only allows for the intercepts to vary between participants.
fit.random_intercept = lmer(formula = reaction ~ 1 + days + (1 | subject),
data = df.sleep)
And let’s visualize what these predictions look like:
fit.random_intercept %>%
augment() %>%
clean_names() %>%
ggplot(data = .,
mapping = aes(x = days,
y = reaction)) +
geom_line(aes(y = fitted),
color = "blue") +
geom_point() +
facet_wrap(~subject, ncol = 5) +
labs(x = "Days of sleep deprivation",
y = "Average reaction time (ms)") +
scale_x_continuous(breaks = 0:4 * 2) +
theme(strip.text = element_text(size = 12),
axis.text.y = element_text(size = 12))
geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?
Now, all the lines are parallel but the intercept differs between participants.
Finally, let’s compare a model that only allows for the slopes to differ but not the intercepts.
fit.random_slope = lmer(formula = reaction ~ 1 + days + (0 + days | subject),
data = df.sleep)
And let’s visualize the model fit:
fit.random_slope %>%
augment() %>%
clean_names() %>%
ggplot(data = .,
mapping = aes(x = days,
y = reaction)) +
geom_line(aes(y = fitted),
color = "blue") +
geom_point() +
facet_wrap(~subject, ncol = 5) +
labs(x = "Days of sleep deprivation",
y = "Average reaction time (ms)") +
scale_x_continuous(breaks = 0:4 * 2) +
theme(strip.text = element_text(size = 12),
axis.text.y = element_text(size = 12))
geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?
Here, all the lines have the same starting point (i.e. the same intercept) but the slopes are different.
Let’s compare the results of the different methods – complete pooling, no pooling, and partial pooling (with random intercepts and slopes).
# complete pooling
fit.complete_pooling = lm(formula = reaction ~ days,
data = df.sleep)
df.complete_pooling = fit.complete_pooling %>%
augment() %>%
bind_rows(
fit.complete_pooling %>%
augment(newdata = tibble(subject = c("373", "374"),
days = rep(10, 2)))
) %>%
clean_names() %>%
select(reaction, days, complete_pooling = fitted)
# no pooling
df.no_pooling = df.sleep %>%
group_by(subject) %>%
nest(days, reaction) %>%
mutate(fit = map(data, ~ lm(reaction ~ days, data = .)),
augment = map(fit, augment)) %>%
unnest(augment) %>%
clean_names() %>%
select(subject, reaction, days, no_pooling = fitted)
# partial pooling
fit.lmer = lmer(formula = reaction ~ 1 + days + (1 + days | subject),
data = df.sleep)
df.partial_pooling = fit.lmer %>%
augment() %>%
bind_rows(
fit.lmer %>%
augment(newdata = tibble(subject = c("373", "374"),
days = rep(10, 2)))
) %>%
clean_names() %>%
select(subject, reaction, days, partial_pooling = fitted)
# combine results
df.pooling = df.partial_pooling %>%
left_join(df.complete_pooling) %>%
left_join(df.no_pooling)
Let’s compare the predictions of the different models visually:
ggplot(data = df.pooling,
mapping = aes(x = days,
y = reaction)) +
geom_smooth(method = "lm",
se = F,
color = "orange",
fullrange = T) +
geom_line(aes(y = complete_pooling),
color = "green") +
geom_line(aes(y = partial_pooling),
color = "blue") +
geom_point() +
facet_wrap(~subject, ncol = 5) +
labs(x = "Days of sleep deprivation",
y = "Average reaction time (ms)") +
scale_x_continuous(breaks = 0:4 * 2) +
theme(strip.text = element_text(size = 12),
axis.text.y = element_text(size = 12))
Warning: Removed 4 rows containing non-finite values (stat_smooth).
Warning: Removed 4 rows containing missing values (geom_point).
To better see the differences between the approaches, let’s focus on the predictions for the participants with incomplete data:
# subselection
ggplot(data = df.pooling %>%
filter(subject %in% c("373", "374")),
mapping = aes(x = days,
y = reaction)) +
geom_smooth(method = "lm",
se = F,
color = "orange",
fullrange = T) +
geom_line(aes(y = complete_pooling),
color = "green") +
geom_line(aes(y = partial_pooling),
color = "blue") +
geom_point() +
facet_wrap(~subject) +
labs(x = "Days of sleep deprivation",
y = "Average reaction time (ms)") +
scale_x_continuous(breaks = 0:4 * 2) +
theme(strip.text = element_text(size = 12),
axis.text.y = element_text(size = 12))
Warning: Removed 4 rows containing non-finite values (stat_smooth).
Warning: Removed 4 rows containing missing values (geom_point).
One good way to get a sense for what the different models are doing is by taking a look at the coefficients:
fit.complete_pooling %>%
coef()
(Intercept) days
252.32070 10.32766
fit.random_intercept %>%
coef()
$subject
(Intercept) days
308 292.2749 10.43191
309 174.0559 10.43191
310 188.7454 10.43191
330 256.0247 10.43191
331 261.8141 10.43191
332 259.8262 10.43191
333 268.0765 10.43191
334 248.6471 10.43191
335 206.5096 10.43191
337 323.5643 10.43191
349 230.5114 10.43191
350 265.6957 10.43191
351 243.7988 10.43191
352 287.8850 10.43191
369 258.6454 10.43191
370 245.2931 10.43191
371 248.3508 10.43191
372 269.6861 10.43191
373 248.2086 10.43191
374 273.9400 10.43191
attr(,"class")
[1] "coef.mer"
fit.random_slope %>%
coef()
$subject
(Intercept) days
308 252.2965 19.9526801
309 252.2965 -4.3719650
310 252.2965 -0.9574726
330 252.2965 8.9909957
331 252.2965 10.5394285
332 252.2965 11.3994289
333 252.2965 12.6074020
334 252.2965 10.3413879
335 252.2965 -0.5722073
337 252.2965 24.2246485
349 252.2965 7.7702676
350 252.2965 15.0661415
351 252.2965 7.9675415
352 252.2965 17.0002999
369 252.2965 11.6982767
370 252.2965 11.3939807
371 252.2965 9.4535879
372 252.2965 13.4569059
373 252.2965 10.4142695
374 252.2965 11.9097917
attr(,"class")
[1] "coef.mer"
fit.random_intercept_slope %>%
coef()
$subject
(Intercept) days
308 253.9479 19.6264139
309 211.7328 1.7319567
310 213.1579 4.9061843
330 275.1425 5.6435987
331 273.7286 7.3862680
332 260.6504 10.1632535
333 268.3684 10.2245979
334 244.5523 11.4837825
335 251.3700 -0.3355554
337 286.2321 19.1090061
349 226.7662 11.5531963
350 238.7807 17.0156766
351 256.2344 7.4119501
352 272.3512 13.9920698
369 254.9484 11.2985741
370 226.3701 15.2027922
371 252.5051 9.4335432
372 263.8916 11.7253342
373 248.9752 10.3915245
374 271.1451 11.0782697
attr(,"class")
[1] "coef.mer"
In mixed effects models, the variance of parameter estimates across participants shrinks compared to a no pooling model (where we fit a different regression to each participant). Expressed differently, individual parameter estimates are borrowing strength from the overall data set in mixed effects models.
# get estimates from partial pooling model
df.partial_pooling = fit.random_intercept_slope %>%
coef() %>%
.[[1]] %>%
rownames_to_column("subject") %>%
clean_names()
# combine estimates from no pooling with partial pooling model
df.plot = df.sleep %>%
group_by(subject) %>%
nest(days, reaction) %>%
mutate(fit = map(data, ~ lm(reaction ~ days, data = .)),
tidy = map(fit, tidy)) %>%
unnest(tidy) %>%
select(subject, term, estimate) %>%
spread(term, estimate) %>%
clean_names() %>%
mutate(method = "no pooling") %>%
bind_rows(df.partial_pooling %>%
mutate(method = "partial pooling")) %>%
gather("index", "value", -c(subject, method)) %>%
mutate(index = factor(index, levels = c("intercept", "days")))
# visualize the results
ggplot(data = df.plot,
mapping = aes(x = value,
group = method,
fill = method)) +
stat_density(position = "identity",
geom = "area",
color = "black",
alpha = 0.3) +
facet_grid(cols = vars(index),
scales = "free")
Warning: Removed 1 rows containing non-finite values (stat_density).
Bootstrapping is a good way to estimate our uncertainty on the parameter estimates in the model.
Let’s briefly review how to do bootstrapping in a simple linear model.
# fit model
fit.lm = lm(formula = reaction ~ 1 + days,
data = df.sleep)
# coefficients
fit.lm %>% coef()
# bootstrapping
df.boot = df.sleep %>%
bootstrap(n = 100,
id = "id") %>%
mutate(fit = map(strap, ~ lm(formula = reaction ~ 1 + days, data = .)),
tidy = map(fit, tidy)) %>%
unnest(tidy) %>%
select(id, term, estimate) %>%
spread(term, estimate) %>%
clean_names()
(Intercept) days
252.32070 10.32766
Let’s illustrate the linear model with a confidence interval (making parametric assumptions using the t-distribution).
ggplot(data = df.sleep,
mapping = aes(x = days, y = reaction)) +
geom_smooth(method = "lm") +
geom_point(alpha = 0.3)
And let’s compare this with the different regression lines that we get out of our bootstrapped samples:
ggplot(data = df.sleep,
mapping = aes(x = days, y = reaction)) +
geom_abline(data = df.boot,
aes(intercept = intercept,
slope = days,
group = id),
alpha = 0.1) +
geom_point(alpha = 0.3)
For the linear mixed effects model, we can use the bootmer() function to do bootstrapping.
# fit the model
fit.lmer = lmer(formula = reaction ~ 1 + days + (1 + days | subject),
data = df.sleep)
# bootstrap parameter estimates
boot.lmer = bootMer(fit.lmer,
FUN = fixef,
nsim = 100)
# compute confidence interval
boot.ci(boot.lmer, index = 2, type = "perc")
# plot estimates
boot.lmer$t %>%
as_tibble() %>%
clean_names() %>%
mutate(id = 1:n()) %>%
gather("index", "value", - id) %>%
ggplot(data = .,
mapping = aes(x = value)) +
geom_density() +
facet_grid(cols = vars(index),
scales = "free") +
coord_cartesian(expand = F)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 100 bootstrap replicates
CALL :
boot.ci(boot.out = boot.lmer, type = "perc", index = 2)
Intervals :
Level Percentile
95% ( 7.36, 13.52 )
Calculations and Intervals on Original Scale
Some percentile intervals may be unstable
We can use the “lmerTest” package to get p-values for the different fixed effects.
lmerTest::lmer(formula = reaction ~ 1 + days + (1 + days | subject),
data = df.sleep) %>%
summary()
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: reaction ~ 1 + days + (1 + days | subject)
Data: df.sleep
REML criterion at convergence: 1771.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.9707 -0.4703 0.0276 0.4594 5.2009
Random effects:
Groups Name Variance Std.Dev. Corr
subject (Intercept) 582.73 24.140
days 35.03 5.919 0.07
Residual 649.36 25.483
Number of obs: 183, groups: subject, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 252.543 6.433 19.294 39.256 < 2e-16 ***
days 10.452 1.542 17.163 6.778 3.06e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
days -0.137
Here is an overview of how to specify different kinds of linear mixed effects models.
| formula | description |
|---|---|
dv ~ x1 + (1 | g)
|
Random intercept for each level of g
|
dv ~ x1 + (0 + x1 | g)
|
Random slope for each level of g
|
dv ~ x1 + (x1 | g)
|
Correlated random slope and intercept for each level of g
|
dv ~ x1 + (x1 || g)
|
Uncorrelated random slope and intercept for each level of g
|
dv ~ x1 + (1 | school) + (1 | teacher)
|
Random intercept for each level of school and for each level of teacher (crossed)
|
dv ~ x1 + (1 | school/teacher)
|
Random intercept for each level of school and for each level of teacher in school (nested)
|
Information about this R session including which version of R was used, and what packages were loaded.
sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.3
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.3.0 stringr_1.4.0 dplyr_0.8.0.1 purrr_0.3.0
[5] readr_1.3.1 tidyr_0.8.2 tibble_2.0.1 ggplot2_3.1.0
[9] tidyverse_1.2.1 boot_1.3-20 modelr_0.1.3 lme4_1.1-20
[13] Matrix_1.2-15 patchwork_0.0.1 broom_0.5.1 janitor_1.1.1
[17] kableExtra_1.0.1 knitr_1.21
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 lubridate_1.7.4 lattice_0.20-38
[4] assertthat_0.2.0 digest_0.6.18 R6_2.4.0
[7] cellranger_1.1.0 plyr_1.8.4 backports_1.1.3
[10] evaluate_0.13 highr_0.7 httr_1.4.0
[13] pillar_1.3.1 rlang_0.3.1 lazyeval_0.2.1
[16] readxl_1.3.0 rstudioapi_0.9.0 minqa_1.2.4
[19] nloptr_1.2.1 rmarkdown_1.11 labeling_0.3
[22] splines_3.5.2 webshot_0.5.1 munsell_0.5.0
[25] compiler_3.5.2 numDeriv_2016.8-1 xfun_0.4
[28] pkgconfig_2.0.2 lmerTest_3.1-0 htmltools_0.3.6
[31] tidyselect_0.2.5 bookdown_0.9 viridisLite_0.3.0
[34] crayon_1.3.4 withr_2.1.2 MASS_7.3-51.1
[37] grid_3.5.2 nlme_3.1-137 jsonlite_1.6
[40] gtable_0.2.0 magrittr_1.5 scales_1.0.0
[43] cli_1.0.1 stringi_1.3.1 reshape2_1.4.3
[46] snakecase_0.9.2 xml2_1.2.0 generics_0.0.2
[49] tools_3.5.2 glue_1.3.0 hms_0.4.2
[52] yaml_2.2.0 colorspace_1.4-0 rvest_0.3.2
[55] haven_2.0.0
Bates, Douglas, and Martin Maechler. 2018. Matrix: Sparse and Dense Matrix Classes and Methods. https://CRAN.R-project.org/package=Matrix.
Bates, Douglas, Martin Maechler, Ben Bolker, and Steven Walker. 2019. Lme4: Linear Mixed-Effects Models Using ’Eigen’ and S4. https://CRAN.R-project.org/package=lme4.
Canty, Angelo, and Brian Ripley. 2017. Boot: Bootstrap Functions (Originally by Angelo Canty for S). https://CRAN.R-project.org/package=boot.
Firke, Sam. 2018. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.
Henry, Lionel, and Hadley Wickham. 2019. Purrr: Functional Programming Tools. https://CRAN.R-project.org/package=purrr.
Müller, Kirill, and Hadley Wickham. 2019. Tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble.
Pedersen, Thomas Lin. 2017. Patchwork: The Composer of Ggplots. https://github.com/thomasp85/patchwork.
R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Robinson, David, and Alex Hayes. 2018. Broom: Convert Statistical Analysis Objects into Tidy Tibbles. https://CRAN.R-project.org/package=broom.
Wickham, Hadley. 2017. Tidyverse: Easily Install and Load the ’Tidyverse’. https://CRAN.R-project.org/package=tidyverse.
———. 2018. Forcats: Tools for Working with Categorical Variables (Factors). https://CRAN.R-project.org/package=forcats.
———. 2019a. Modelr: Modelling Functions That Work with the Pipe. https://CRAN.R-project.org/package=modelr.
———. 2019b. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, and Kara Woo. 2018. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2019. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, and Lionel Henry. 2018. Tidyr: Easily Tidy Data with ’Spread()’ and ’Gather()’ Functions. https://CRAN.R-project.org/package=tidyr.
Wickham, Hadley, Jim Hester, and Romain Francois. 2018. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Xie, Yihui. 2018. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.
Zhu, Hao. 2019. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.